# **Single Convolution Layer Vitis Kernel Results:**

### Timing Information (MHz)

| Compute Unit                           | Kernel Name | Module Name                                                  | Target Frequency | Estimated Frequency |
|----------------------------------------|-------------|--------------------------------------------------------------|------------------|---------------------|
| computePointHLS_1<br>computePointHLS_1 |             | computePointHLS_Pipeline_VITIS_LOOP_17_1<br>computePointHLS  | 300.30           | 411.0<br>411.0      |

### Latency Information

| Compute Unit                           | Kernel Name | Module Name                                                  | Start Interval | Best (cycles) | Avg (cycles) | Worst (cycles) | Best (absolute)        | Avg (absolute)         | Worst (absolute)       |
|----------------------------------------|-------------|--------------------------------------------------------------|----------------|---------------|--------------|----------------|------------------------|------------------------|------------------------|
| computePointHLS_1<br>computePointHLS_1 |             | computePointHLS_Pipeline_VITIS_LOOP_17_1<br>computePointHLS  | 8883<br>9108   | 8883<br>9107  | 8883<br>9107 | 8883<br>9107   | 29.607 us<br>30.354 us | 29.607 us<br>30.354 us | 29.607 us<br>30.354 us |

### Area Information

| Compute Unit                           | Kernel Name | Module Name | FF            | LUT         | DSP | BRAM | URAM |
|----------------------------------------|-------------|-------------|---------------|-------------|-----|------|------|
| computePointHLS_1<br>computePointHLS_1 |             |             | 8717<br>11733 | 660<br>5215 | 0   | 0    | 0    |

## 1 Convolution: estimated 30.354 us

To run the full layer: 100352 \* 30.354 us = 3.046 seconds
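The arithmetic above can be reproduced with a small helper. This is only a sketch: the function and parameter names are illustrative and do not come from the project code.

```cpp
#include <cstdint>

// Estimated time for a full layer, in seconds, given the number of
// convolution points and the per-point kernel latency in microseconds.
// Names are hypothetical; only the arithmetic comes from the notes.
double fullLayerSeconds(std::uint64_t numPoints, double latencyUsPerPoint) {
    return static_cast<double>(numPoints) * latencyUsPerPoint * 1e-6;
}
```

Calling `fullLayerSeconds(100352, 30.354)` gives roughly 3.046 seconds. This counts kernel latency only and ignores any data transfer time.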



## Resources

- LUT: 4,516 (0.86 %)
- BRAM: 10 (1.02 %)
- URAM: 0 (N/A)
- Register: 7,990 (1.13 %)
- DSP: 5 (0.25 %)

| Name                  | LUT    | LUTAsMem | REG    | BRAM | URAM | DSP  |
|-----------------------|--------|----------|--------|------|------|------|
| Platform              | 146550 | 9581     | 202115 | 249  | 12   | 9    |
| User Budget           | 376170 | 151699   | 843325 | 735  | 116  | 1959 |
| Used Resources        | 4516   | 1005     | 7990   | 10   | 0    | 5    |
| Unused Resources      | 371654 | 150694   | 835335 | 725  | 116  | 1954 |
| computePointHLS (1)   | 4516   | 1005     | 7990   | 10   | 0    | 5    |
| computePointHLS_1     | 4516   | 1005     | 7990   | 10   | 0    | 5    |

# Kernel Usage

| Resource | Utilization | Available | Utilization % |
|----------|-------------|-----------|---------------|
| LUT      | 151066      | 522720    | 28.90         |
| LUTRAM   | 10586       | 161280    | 6.56          |
| FF       | 210100      | 1045440   | 20.10         |
| BRAM     | 260         | 984       | 26.42         |
| URAM     | 12          | 128       | 9.38          |
| DSP      | 14          | 1968      | 0.71          |
| IO       | 151         | 516       | 29.26         |
| GT       | 8           | 28        | 28.57         |
| BUFG     | 43          | 940       | 4.57          |
| MMCM     | 4           | 11        | 36.36         |
| PLL      | 6           | 22        | 27.27         |
| PCIe     | 2           | 5         | 40.00         |

## **Device Utilization**

## **Original Convolution Layer Kernel Call:**

- Data Input: Array of 800, all value 1
- Data Weight: Array of 800, all value 2
- Data Bias: scalar 3
- Data Out In: 0

Expected Output: 800 \* 1 \* 2 + 3 = 1603
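The expected value can be checked on the host with a plain software reference. The sketch below is an assumption about what one convolution point computes (initial output value, plus the dot product of input and weight, plus bias). The original kernel source is not shown in these notes, and `referencePoint` is a hypothetical name.

```cpp
#include <cstddef>
#include <vector>

// Software reference for one convolution point (assumed behavior):
// start from the initial output value, accumulate dataIn[i] * weight[i],
// then add the scalar bias.
float referencePoint(const std::vector<float>& dataIn,
                     const std::vector<float>& weight,
                     float bias, float outInit) {
    float acc = outInit;
    for (std::size_t i = 0; i < dataIn.size() && i < weight.size(); ++i) {
        acc += dataIn[i] * weight[i];
    }
    return acc + bias;
}
```

With 800 inputs of 1, 800 weights of 2, a bias of 3, and an initial output of 0, this returns 1603, matching the expected output used in the test run.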

Device[0]: program successful!

Doing matrix convolution w/ P2P read/write 

INFO: Successfully opened NVME SSD /dev/nvme0n1p1

Generating Test Input Values

Host in data BEFORE Host pwrite: 1.000000

Writing initial data to SSD

SSD in data AFTER Host pread: 1.000000

Called p2p read write()

Map NVME buffer to host access pointers

Now start P2P Read from SSD to FPGA DRAM

FPGA DRAM data BEFORE SSD pread: -nan

FPGA DRAM in data: -nan FPGA DRAM out init: -nan FPGA DRAM in weight: -nan FPGA DRAM in bias: 0.000000

Bytes read from in data to FPGA: 4096 FPGA DRAM data AFTER SSD pread: 1.000000

FPGA DRAM in data: 1.000000 FPGA DRAM out init: 0.000000 FPGA DRAM in weight: 2.000000 FPGA DRAM in bias: 3.000000

Now start P2P write from device memory to SSD

Expected Output: 1603.000000 Actual Output: 1603.000000

TEST COMPLETE

# **Original Convolution Point Kernel Call Timing Results:**

1 block = 4 KB

Use a separate block for each input array. I think I can reduce this by combining blocks.

- Host to SSD write (4 blocks): 221 us
- SSD to FPGA writes (4 blocks): 185 us
- Kernel run: 236 us
- FPGA to SSD write (1 block): 89 us
- SSD to Host write (1 block): 114 us

# **Original Convolution Layer Kernel Call Timing Results:**

Repeated the single-point call for the full layer (100352 iterations), with the same per-iteration timings:

- Host to SSD write (4 blocks): 221 us
- SSD to FPGA writes (4 blocks): 185 us
- Kernel run: 236 us
- FPGA to SSD write (1 block): 89 us
- SSD to Host write (1 block): 114 us

Full layer: 32.417 seconds

# Reducing write size for kernel result

Before this change, each of the 4 input arguments and the 1 output argument used its own block of 1024 float entries (4 KB each).

Transfer sizes (in float values) that worked with pwrite/pread:

- 1024
- 512
- 256
- 128

Got a WRITE FAIL when using 64 float values or fewer.

Changed the output write array size to 128 floats, which reduced writeback time by 50 us. No changes were made to the kernel.
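One possible explanation for the failure at 64 floats, offered as an assumption rather than a confirmed diagnosis: if the SSD file is opened with `O_DIRECT` (the open flags are not shown in these notes), transfer sizes generally need to be a multiple of the device's logical block size, which is often 512 bytes. 128 floats is exactly 512 bytes, while 64 floats is only 256 bytes. The check below sketches that rule; `isSectorMultiple` and the 512-byte value are hypothetical and should be verified against the actual device.

```cpp
#include <cstddef>

// Assumption: 512-byte logical block size. Check the real value for the
// target SSD before relying on this.
constexpr std::size_t kAssumedSectorBytes = 512;

// True if a transfer of numFloats float values is a non-empty whole
// number of sectors.
constexpr bool isSectorMultiple(std::size_t numFloats,
                                std::size_t sectorBytes = kAssumedSectorBytes) {
    return numFloats > 0 && (numFloats * sizeof(float)) % sectorBytes == 0;
}
```

Under this assumption, 1024, 512, 256, and 128 floats pass the check and 64 floats does not, which is consistent with the sizes observed to work and fail.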

# **Optimizing Kernel Pipelining**

Set the pipeline stride with the HLS pragma `#pragma HLS pipeline II=stride`.
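The kernel listed under Final Synthesis Results accumulates into `STRIDE` separate partial sums and combines them at the end. A plain C++ software model of that pattern is sketched here to show that it produces the same result as a single running sum for the test data. The stride value of 16 and the name `stridedDot` are illustrative assumptions, and the HLS pragmas are left out so the model compiles with any C++ compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative stride; the notes try strides 4, 8, 16, 32 and 64.
constexpr std::size_t kStride = 16;

// Dot product using kStride independent partial sums. Each partial sum is
// updated once per outer iteration, so the multiply-adds within one outer
// iteration do not depend on each other.
float stridedDot(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size() && a.size() % kStride == 0);
    float partial[kStride] = {};
    for (std::size_t i = 0; i < a.size(); i += kStride) {
        for (std::size_t j = 0; j < kStride; ++j) {
            partial[j] += a[i + j] * b[i + j];
        }
    }
    // Final reduction of the partial sums into one value.
    float total = 0.0f;
    for (std::size_t j = 0; j < kStride; ++j) {
        total += partial[j];
    }
    return total;
}
```

For 800 inputs of 1 and 800 weights of 2 this returns 1600. With general floating-point data the strided sum can differ slightly from a sequential sum because the additions happen in a different order.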

#### Original Cycle Report:

#### Cycle Report, Stride 4:

#### Cycle Report, Stride 8:

#### Cycle Report, Stride 16 (lowest without bandwidth issue):



Above stride 16, the bus doesn't have enough ports.

Note: Strides 4 and 16 gave the same timing results, so I think this change has reached its maximum improvement.

# Setting separate m\_axi ports for two read arrays

This halved the kernel's execution cycle time, since the two arrays can now be read in parallel.

#### Stride 16:

#### Stride 32:

#### Stride 64:



Stride 64 took longer, so stride 32 is optimal.

No improvement????

# Multiple Iterations per kernel call

Fewer data reads/writes to the FPGA.

Did 40 points per call first, which got the time down to 1.5 seconds.

Tried 90 next.

Then did 800.

800 was the largest size I could use, and it achieved a time lower than the baseline!
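Batching helps because the fixed per-call transfer overhead is paid once per kernel call rather than once per point. As a rough sketch, the number of kernel calls for each batch size above can be computed as follows. `kernelCalls` is a hypothetical helper, and it assumes the last call may handle a partial batch.

```cpp
#include <cstdint>

// Number of kernel calls needed to cover totalPoints when each call
// processes up to pointsPerCall points (rounds up for a partial last batch).
// pointsPerCall must be greater than zero.
constexpr std::uint64_t kernelCalls(std::uint64_t totalPoints,
                                    std::uint64_t pointsPerCall) {
    return (totalPoints + pointsPerCall - 1) / pointsPerCall;
}
```

For the 100352-point layer this gives 2509 calls at 40 points per call, 1116 calls at 90, and 126 calls at 800, compared with 100352 calls at one point per call.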

## **Final Synthesis Results**

```
// More points per kernel call: 800 iterations per point, 100 points.
// STRIDE, POINT_SIZE and NUM_POINTS are compile-time constants defined elsewhere.
void computePointHLS(float* dataIn, float* layerWeight, float* dataOut_final) {
#pragma HLS interface mode=m_axi port=dataIn bundle=gmem0
#pragma HLS interface mode=m_axi port=layerWeight bundle=gmem1
#pragma HLS interface mode=m_axi port=dataOut_final bundle=gmem0
    float temp_add[STRIDE];
    for (int point = 0; point < NUM_POINTS; point++) {
        // Reset the partial sums for this point
        for (int i = 0; i < STRIDE; i++) {
            temp_add[i] = 0;
        }
        // Compute: accumulate STRIDE partial sums per pipelined iteration
        for (int i = 0; i < POINT_SIZE; i += STRIDE) {
            #pragma HLS pipeline
            for (int j = 0; j < STRIDE; j++) {
                temp_add[j] += dataIn[POINT_SIZE*point + i+j] * layerWeight[POINT_SIZE*point + i+j];
            }
        }
        // Reduce the partial sums into temp_add[0]
        for (int i = 1; i < STRIDE; i++) {
            #pragma HLS unroll
            temp_add[0] += temp_add[i];
        }
        // Store dataOut data
        dataOut_final[point] = temp_add[0];
    }
}
```



| Accelerator (9)      |                                                                                                                                                                                                                                                                 |
|----------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| computePointHLS (1)  |                                                                                                                                                                                                                                                                 |
| Performance (1)      |                                                                                                                                                                                                                                                                 |
| AUTO-FREQ-SCALING-04 | One or more timing paths failed timing requirements. The kernel clock blp_s_aclk_kernel_ref_clk_00 has an original frequency equal to 300.000000 MHz. The frequency has been automatically changed to 292.7 MHz to enable proper functionality. The clock Id is 0. |
| computePointHLS (8)  | Open HLS project for computePointHLS                                                                                                                                                                                                                            |
| Latency              | Cannot flatten loop 'VITIS_LOOP_102_1' (Compute_HLS.cpp:102) in function 'computePointHLS' more than one sub loop.                                                                                                                                              |
| Throughput (7)       |                                                                                                                                                                                                                                                                 |
| Throughput           | The II Violation in module 'computePointHLS_Pipeline_VITIS_LOOP_109_3' (loop 'VITIS_LOOP_109_3'): Unable to enforce a carried dependence constraint (II = 1, distance = 1, offset = 0) between 'store' operation ('add256_write_ln109', Compute_HLS.cpp:109) of variable 'add256' and 'load' operation ('add256_load', Compute_HLS.cpp:113) on local variable 'add256'. |
| Throughput           | The II Violation in module 'computePointHLS_Pipeline_VITIS_LOOP_109_3' (loop 'VITIS_LOOP_109_3'): Unable to enforce a carried dependence constraint (II = 2, distance = 1, offset = 0) between 'store' operation ('add256_write_ln109', Compute_HLS.cpp:109) of variable 'add256' and 'load' operation ('add256_load', Compute_HLS.cpp:113) on local variable 'add256'. |
| Throughput           | The II Violation in module 'computePointHLS_Pipeline_VITIS_LOOP_109_3' (loop 'VITIS_LOOP_109_3'): Unable to enforce a carried dependence constraint (II = 3, distance = 1, offset = 0) between 'store' operation ('add256_write_ln109', Compute_HLS.cpp:109) of variable 'add256' and 'load' operation ('add256_load', Compute_HLS.cpp:113) on local variable 'add256'. |
| Throughput           | The II Violation in module 'computePointHLS_Pipeline_VITIS_LOOP_109_3' (loop 'VITIS_LOOP_109_3'): Unable to enforce a carried dependence constraint (II = 4, distance = 1, offset = 0) between 'store' operation ('add256_write_ln109', Compute_HLS.cpp:109) of variable 'add256' and 'load' operation ('add256_load', Compute_HLS.cpp:113) on local variable 'add256'. |
| Throughput           | The II Violation in module 'computePointHLS_Pipeline_VITIS_LOOP_109_3' (loop 'VITIS_LOOP_109_3'): Unable to enforce a carried dependence constraint (II = 7, distance = 1, offset = 0) between 'store' operation ('add256_write_ln109', Compute_HLS.cpp:109) of variable 'add256' and 'load' operation ('add256_load', Compute_HLS.cpp:113) on local variable 'add256'. |
| Throughput           | The II Violation in module 'computePointHLS_Pipeline_VITIS_LOOP_109_3' (loop 'VITIS_LOOP_109_3'): Unable to enforce a carried dependence constraint (II = 9, distance = 1, offset = 0) between 'store' operation ('add256_write_ln109', Compute_HLS.cpp:109) of variable 'add256' and 'load' operation ('add256_load', Compute_HLS.cpp:113) on local variable 'add256'. |
                                                                                            |
| O Throughput                   | The II Violation in module 'computePointHLS_Pipeline_VTIS_LOOP_109_3' (loop 'VITIS_LOOP_109_3'): Unable to enforce a carried dependence constraint = 10, distance = 1, offset = 0) between 'store' operation ('add256_write_in109', Compute_HLS.cpp:109) of variable 'add', Compute_HLS.cpp:113 on local variable 'add256' and 'load' operation ('add256_load', Compute_HLS.cpp:113) on local variable 'add256'.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
                                                                                            |
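
The warnings above all report the same loop-carried dependence: each iteration of `VITIS_LOOP_109_3` loads `add256`, which the previous iteration stored (distance = 1), so a new iteration cannot start until the previous update has completed. The source of `Compute_HLS.cpp` is not reproduced in this report, so the following is only a minimal sketch of the general pattern and of one common workaround (interleaved partial sums). It assumes a floating-point accumulation over 25 products, matching the reported trip count. The names `dotNaive`, `dotPartial`, `TRIP`, and `LANES` are hypothetical, and `#pragma HLS` directives are left out so the snippet compiles with any C++ compiler.

```cpp
#include <cstddef>

constexpr std::size_t TRIP = 25;   // trip count reported for VITIS_LOOP_109_3
constexpr std::size_t LANES = 4;   // hypothetical number of partial sums

// Naive form: every iteration reads the value of acc written by the previous
// iteration (dependence distance 1), so the achievable II is bounded by the
// latency of the add.
float dotNaive(const float* in, const float* w) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < TRIP; ++i) {
        acc += in[i] * w[i];
    }
    return acc;
}

// Partial-sum form: lane k is only read again LANES iterations later,
// which stretches the dependence distance from 1 to LANES.
float dotPartial(const float* in, const float* w) {
    float part[LANES] = {0.0f, 0.0f, 0.0f, 0.0f};
    for (std::size_t i = 0; i < TRIP; ++i) {
        part[i % LANES] += in[i] * w[i];
    }
    // Short reduction over the lanes after the main loop.
    float acc = 0.0f;
    for (std::size_t k = 0; k < LANES; ++k) {
        acc += part[k];
    }
    return acc;
}
```

Both functions compute the same sum in exact arithmetic, but the partial-sum form reorders floating-point additions, so results may differ in the last bits. Whether this rewrite actually lowers the II for this kernel would have to be confirmed by re-running synthesis.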

Timing Information (MHz)

| Compute Unit      | Kernel Name     | Module Name                               | Target Frequency | Estimated Frequency |
|-------------------|-----------------|-------------------------------------------|------------------|---------------------|
| computePointHLS_1 | computePointHLS | computePointHLS_Pipeline_VITIS_LOOP_104_2 | 300.300293       | 494.559814          |
| computePointHLS_1 | computePointHLS | computePointHLS_Pipeline_VITIS_LOOP_109_3 | 300.300293       | 370.096252          |
| computePointHLS_1 | computePointHLS | computePointHLS                           | 300.300293       |                     |

Latency Information

| Compute Unit      | Kernel Name     | Module Name                               | Start Interval | Best (cycles) | Avg (cycles) | Worst (cycles) | Best (absolute) | Avg (absolute) | Worst (absolute) |
|-------------------|-----------------|-------------------------------------------|----------------|---------------|--------------|----------------|-----------------|----------------|------------------|
| computePointHLS_1 | computePointHLS | computePointHLS_Pipeline_VITIS_LOOP_104_2 | 34             |               |              | 34             | 0.113 us        | 0.113 us       |                  |
| computePointHLS_1 | computePointHLS | computePointHLS_Pipeline_VITIS_LOOP_109_3 | 296            |               |              | 296            | 0.987 us        | 0.987 us       |                  |
| computePointHLS_1 | computePointHLS | computePointHLS                           | 708748         |               |              | 708747         | 2.362 ms        | 2.362 ms       |                  |

Area Information

| Compute Unit      | Kernel Name     | Module Name                               | FF    | LUT   | DSP | BRAM | URAM |
|-------------------|-----------------|-------------------------------------------|-------|-------|-----|------|------|
| computePointHLS_1 | computePointHLS | computePointHLS_Pipeline_VITIS_LOOP_104_2 | 8     | 50    | 0   | 0    | 0    |
| computePointHLS_1 | computePointHLS | computePointHLS_Pipeline_VITIS_LOOP_109_3 | 5431  | 2353  | 0   | 0    | 0    |
| computePointHLS_1 | computePointHLS | computePointHLS                           | 11520 | 11508 |     | 2    | 0    |

# Kernel Synthesis Utilization


| Name                | LUT    | LUTAsMem | REG    | BRAM | URAM | DSP  |
|---------------------|--------|----------|--------|------|------|------|
| Platform            | 147967 | 9990     | 205724 | 257  | 12   | 9    |
| User Budget         | 374753 | 151290   | 839716 | 727  | 116  | 1959 |
| Used Resources      | 9715   | 1199     | 13127  | 24   | 0    | 15   |
| Unused Resources    | 365038 | 150091   | 826589 | 703  | 116  | 1944 |
| computePointHLS (1) | 9715   | 1199     | 13127  | 24   | 0    | 15   |
| computePointHLS_1   | 9715   | 1199     | 13127  | 24   | 0    | 15   |

# Kernel Synthesis Utilization







| Name                | LUT     | LUTAsMem | REG     | BRAM    | URAM    | DSP     |
|---------------------|---------|----------|---------|---------|---------|---------|
| Platform            | 28.31%  | 6.19%    | 19.68%  | 26.12%  | 9.38%   | 0.46%   |
| User Budget         | 100.00% | 100.00%  | 100.00% | 100.00% | 100.00% | 100.00% |
| Used Resources      | 2.59%   | 0.79%    | 1.56%   | 3.30%   | 0.00%   | 0.77%   |
| Unused Resources    | 97.41%  | 99.21%   | 98.44%  | 96.70%  | 100.00% | 99.23%  |
| computePointHLS (1) | 2.59%   | 0.79%    | 1.56%   | 3.30%   | 0.00%   | 0.77%   |
| computePointHLS_1   | 2.59%   | 0.79%    | 1.56%   | 3.30%   | 0.00%   | 0.77%   |
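
The kernel percentages in this table can be reproduced from the absolute counts reported for the same resources (for example 9715 LUTs used against a user budget of 374753). A minimal sketch of that arithmetic follows. The helper name `utilizationPercent` is hypothetical and is not part of any Vitis tool; it assumes a non-zero budget.

```cpp
#include <cmath>

// Share of the user budget consumed by the kernel, in percent,
// rounded to two decimals as in the report. Assumes budget != 0.
double utilizationPercent(double used, double budget) {
    return std::round(used / budget * 10000.0) / 100.0;
}
```

With the reported counts, `utilizationPercent(9715, 374753)` gives 2.59 for LUT and `utilizationPercent(24, 727)` gives 3.30 for BRAM, in line with the "Used Resources" row.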



| Resource | Utilization | Available | Utilization % |
|----------|-------------|-----------|---------------|
| LUT      | 139074      | 522624    | 26.61         |
| LUTRAM   | 10569       | 161264    | 6.55          |
| FF       | 210693      | 1045440   | 20.15         |
| BRAM     | 281         | 984       | 28.56         |
| URAM     | 12          | 128       | 9.38          |
| DSP      | 24          | 1968      | 1.22          |
| IO       | 151         | 516       | 29.26         |
| GT       | 8           | 28        | 28.57         |
| BUFG     | 36          | 940       | 3.83          |
| MMCM     | 4           | 11        | 36.36         |
| PLL      | 6           | 22        | 27.27         |
| PCIe     | 2           | 5         | 40.00         |

| Name                                          | Issue Type   | Latency (cycles) | Latency (ns) | Iteration Latency | Interval | Trip Count | Pipelined |
|-----------------------------------------------|--------------|------------------|--------------|-------------------|----------|------------|-----------|
| computePointHLS                               |              | 708747           | 2.362E6      |                   | 708748   |            | no        |
| VITIS_LOOP_102_1                              |              | 708608           | 2.362E6      | 692               |          | 1024       | no        |
| computePointHLS_Pipeline_VITIS_LOOP_104_2     |              | 34               | 113.000      |                   | 34       |            | no        |
| VITIS_LOOP_104_2                              |              | 32               | 107.000      | 1                 | 1        | 32         | yes       |
| computePointHLS_Pipeline_VITIS_LOOP_109_3     | II Violation | 296              | 987.000      |                   | 296      |            | no        |
| VITIS_LOOP_109_3                              |              | 294              | 980.000      | 31                | 11       | 25         | yes       |
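
The loop latencies in this table can be cross-checked with the usual first-order estimates: a pipelined loop takes roughly one iteration latency plus one interval per further iteration, and a non-pipelined loop takes the iteration latency times the trip count. The sketch below encodes those two textbook formulas. The function names are hypothetical, and the exact cycle accounting of the tool may differ slightly from these estimates.

```cpp
#include <cstdint>

// Pipelined loop: the first iteration takes the full iteration latency,
// every later iteration starts ii cycles after the previous one.
constexpr std::uint64_t pipelinedLatency(std::uint64_t iterLatency,
                                         std::uint64_t ii,
                                         std::uint64_t tripCount) {
    return tripCount == 0 ? 0 : iterLatency + ii * (tripCount - 1);
}

// Non-pipelined loop: iterations run strictly one after another.
constexpr std::uint64_t sequentialLatency(std::uint64_t iterLatency,
                                          std::uint64_t tripCount) {
    return iterLatency * tripCount;
}
```

For `VITIS_LOOP_104_2`, `pipelinedLatency(1, 1, 32)` gives 32 cycles, and for `VITIS_LOOP_102_1`, `sequentialLatency(692, 1024)` gives 708608 cycles, both matching the table. For `VITIS_LOOP_109_3`, `pipelinedLatency(31, 11, 25)` gives 295 cycles, one more than the reported 294. The same loop at II = 1 would come to 55 cycles under this estimate, which shows how much of the inner-loop latency is attributable to the II violation.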

#### M\_AXI

| Interface   | Data Width (SW->HW) | Address Width | Latency | Offset | Register | Max Widen Bitwidth |
|-------------|---------------------|---------------|---------|--------|----------|--------------------|
| m_axi_gmem0 | 32 -> 512           | 64            | 64      | slave  | 0        | 512                |
| m_axi_gmem1 | 32 -> 512           | 64            | 64      | slave  | 0        | 512                |

#### S\_AXILITE INTERFACES

| Interface     | Data Width | Address Width | Offset | Register |
|---------------|------------|---------------|--------|----------|
| s_axi_control | 32         | 6             | 16     | 0        |

#### INFERRED BURST SUMMARY

| HW Interface | Loop             | Direction | Length | Width | Location               |
|--------------|------------------|-----------|--------|-------|------------------------|
| m_axi_gmem0  | VITIS_LOOP_102_1 | read      | 51200  | 512   | Compute_HLS.cpp:102:20 |
| m_axi_gmem1  | VITIS_LOOP_102_1 | read      | 51200  | 512   | Compute_HLS.cpp:102:20 |
| m_axi_gmem0  |                  | write     | 64     | 512   | Compute_HLS.cpp:102:20 |

#### INFERRED BURSTS AND WIDENING MISSED

| HW Interface | Variable    | Loop             | Problem                                                                                                  |
|--------------|-------------|------------------|----------------------------------------------------------------------------------------------------------|
| m_axi_gmem0  | datain      | VITIS_LOOP_109_3 | Could not widen since type i512 size is greater than or equal to the max_widen_bitwidth threshold of 512 |
| m_axi_gmem1  | layerWeight | VITIS_LOOP_109_3 | Could not widen since type i512 size is greater than or equal to the max_widen_bitwidth threshold of 512 |